Discrete Cosine Transform and A Case Study on Image Compression

The Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT) are commonly used algorithms to represent an arbitrary signal in terms of an orthonormal basis. In the DFT, the basis functions are cosines and sinusoids (in complex exponential form), whereas the DCT depends only on cosines to represent the signal. Even though the DCT is used more widely than the DFT, the two transforms take a similar approach to representing a signal; they differ only in their basis functions.

Definitions

DFT

DFT tries to represent the signal in the following form: \[ X_k = \sum_{n=0}^{N-1} x_n e^{\frac{-2j\pi kn}{N}} \] where \( k = 0, \ldots, N-1 \), and \(e^{\frac{-2j\pi kn}{N}}\) can be expanded by Euler's formula into the following form: \[e^{\frac{-2j\pi kn}{N}} = \cos\left(\frac{2\pi kn}{N}\right) - j\sin\left(\frac{2\pi kn}{N}\right)\]
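A direct (O(N^2)) evaluation of this sum is a useful sanity check against numpy's FFT. A minimal sketch (the helper name `naive_dft` is mine, not part of the notebook):

```python
import numpy as np

def naive_dft(x):
    """Directly evaluate X_k = sum_n x_n * exp(-2j*pi*k*n / N)."""
    N = len(x)
    n = np.arange(N)
    # outer(k, n) gives the k*n phase matrix; each row is one basis function
    phase = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return phase.dot(x)

x = np.random.rand(16)
assert np.allclose(naive_dft(x), np.fft.fft(x))
```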

DCT

DCT uses only cosines: \[ X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N} \left(n + \frac{1}{2}\right) k\right] \] where \[ k = 0, \ldots, N-1 \]
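The same direct evaluation works for the DCT. Note that scipy's unnormalized DCT-II carries an extra factor of 2 relative to the formula above (a sketch, using `scipy.fftpack` as the rest of the notebook does; `naive_dct` is my own helper name):

```python
import numpy as np
from scipy import fftpack

def naive_dct(x):
    """Directly evaluate X_k = sum_n x_n * cos(pi/N * (n + 1/2) * k)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)  # column vector of output indices
    return np.cos(np.pi / N * (n + 0.5) * k).dot(x)

x = np.random.rand(8)
# fftpack's default (norm=None) DCT-II is scaled by 2 relative to this formula
assert np.allclose(2 * naive_dct(x), fftpack.dct(x))
```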

Since the basis functions of the DCT are real cosine functions, the signals we compress with it need to be real as well, whereas the DFT can represent complex signals too. Both experimentally and theoretically (thanks to better boundary conditions), the DCT yields smaller coefficients for natural images and audio. For compression purposes, the DCT is therefore the de facto method in a number of applications, including JPEG and lossy audio compression.
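A quick way to see the boundary-condition advantage: take a smooth ramp (whose periodic extension has a jump, but whose even extension is continuous), transform it both ways with orthonormal scaling so Parseval holds, and compare how much energy the few largest coefficients capture. A sketch:

```python
import numpy as np
from scipy import fftpack

x = np.linspace(0.0, 1.0, 64)          # a smooth ramp signal
dct_c = fftpack.dct(x, norm='ortho')   # real coefficients, energy-preserving
dft_c = np.fft.fft(x, norm='ortho')    # complex coefficients, energy-preserving

def top_energy_fraction(coeffs, k=8):
    """Fraction of total energy captured by the k largest-magnitude coefficients."""
    mags = np.sort(np.abs(coeffs))[::-1]
    return np.sum(mags[:k] ** 2) / np.sum(mags ** 2)

# The DCT packs more of the ramp's energy into its top coefficients
assert top_energy_fraction(dct_c) > top_energy_fraction(dft_c)
```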

Let's see how one might approach the compression of images using the DCT.

DCT

In [3]:
%matplotlib inline
import io
import os
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from scipy import fftpack
import urllib2
import IPython

I am reading the image from a URL using PIL, converting it to grayscale, and then turning it into a numpy array.

Scipy provides dct out of the box in the fftpack submodule, so one can implement a 2-d dct by taking the dct of the rows of the image and then taking the dct of the transposed result in the same way. I am using norm='ortho', which corresponds to the Type II dct, the variant people usually mean when they say "dct". If you are familiar with Matlab, this optional parameter also matches what you get from dct(y) in Matlab. (Note that there are eight different variants of the dct and this one is the most common; I will not go into the differences, but feel free to check them on Wikipedia.)

The idct is almost the same as the dct, with two i's introduced into the function names; since fftpack provides a very similar API for idct as for dct, the inverse becomes relatively straightforward.

In order to reconstruct the image, the raw output is clipped to the range 0-255 and cast to uint8, so that we end up with the same kind of image we began with.

In [4]:
image_url='http://i.imgur.com/8vuLtqi.png'

def get_image_from_url(image_url='http://i.imgur.com/8vuLtqi.png', size=(128, 128)):
    file_descriptor = urllib2.urlopen(image_url)
    image_file = io.BytesIO(file_descriptor.read())
    image = Image.open(image_file)
    img_color = image.resize(size, 1)
    img_grey = img_color.convert('L')
    img = np.array(img_grey, dtype=np.float)
    return img
 
def get_2D_dct(img):
    """ Get 2D Cosine Transform of Image
    """
    return fftpack.dct(fftpack.dct(img.T, norm='ortho').T, norm='ortho')

def get_2d_idct(coefficients):
    """ Get 2D Inverse Cosine Transform of Image
    """
    return fftpack.idct(fftpack.idct(coefficients.T, norm='ortho').T, norm='ortho')

def get_reconstructed_image(raw):
    img = raw.clip(0, 255)
    img = img.astype('uint8')
    img = Image.fromarray(img)
    return img
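As a quick sanity check, this transform pair is exactly invertible (to within floating-point error). A self-contained sketch restating the two helpers above so it runs on its own:

```python
import numpy as np
from scipy import fftpack

def get_2D_dct(img):
    # dct along rows, then along columns (via the transpose trick)
    return fftpack.dct(fftpack.dct(img.T, norm='ortho').T, norm='ortho')

def get_2d_idct(coefficients):
    return fftpack.idct(fftpack.idct(coefficients.T, norm='ortho').T, norm='ortho')

img = np.random.rand(32, 32) * 255
# The orthonormal DCT/IDCT pair reconstructs the input exactly
assert np.allclose(get_2d_idct(get_2D_dct(img)), img)
```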
In [5]:
!rm -rf frame-imgs
In [6]:
IMG_DIR = 'frame-imgs/'

pixels = get_image_from_url(image_url=image_url, size=(256, 256))
dct_size = pixels.shape[0]
dct = get_2D_dct(pixels)
os.mkdir(IMG_DIR)

reconstructed_images = []

for ii in range(dct_size):
    dct_copy = dct.copy()
    dct_copy[ii:,:] = 0
    dct_copy[:,ii:] = 0
    
    # Reconstructed image
    # Reconstructed image
    r_img = get_2d_idct(dct_copy)
    reconstructed_image = get_reconstructed_image(r_img)

    # Create a list of images
    reconstructed_images.append(reconstructed_image)
    reconstructed_image.save(os.path.join(IMG_DIR, 'img_{0:04}.png'.format(ii)))

os.system("convert -delay 30 -comment '(Image Url: {})' {}img_*.png dct-ipynb.gif".format(image_url, IMG_DIR));
In [7]:
plt.figure(figsize=(16, 12));
plt.scatter(range(dct.ravel().size), np.log10(np.abs(dct.ravel())), c='#348ABD', alpha=.3);
plt.title('DCT Coefficient Amplitude vs. Order of Coefficient');
plt.xlabel('Order of DCT Coefficients');
plt.ylabel('DCT Coefficients Amplitude in log scale');
In [8]:
plt.figure(figsize=(16, 12))
plt.hist(np.log10(np.abs(dct.ravel())), bins=100, color='#348ABD', alpha=.3, histtype='stepfilled');
plt.xlabel('Amplitude of DCT Coefficients (log-scaled)');
plt.ylabel('Number of DCT Coefficients');
plt.title('Amplitude distribution of DCT Coefficients');

These graphs are informative in several ways. First, they tell us that the numbers of very large and of very small coefficients are both quite small (note that the amplitude axis is log-scaled). So the trade-off between compression and image quality generally depends on the coefficients with middle-range values. It is easy to keep the very large coefficients and reject the very small ones in the reconstruction, but not so easy to include or reject the middle values based solely on their amplitudes. For those coefficients, one needs to look at the frequencies they belong to: if a coefficient lies in the high-frequency range it can be rejected, whereas rejecting one in the low-frequency range may introduce large, noticeable artifacts into the signal. Instead of relying only on Root Mean Squared Error (RMSE) to decide where to stop keeping coefficients, one could use better metrics that consider visual fidelity, or even perceived quality, to find the sweet spot between compression ratio and image quality. In implementations, at least for JPEG, this is done with a predetermined cutoff. Therefore, if an image has a pixel distribution different from most natural images, or has very strong high-frequency components, one can see very noticeable artifacts around the edges.
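As a concrete example of such a metric, here is a minimal sketch of PSNR, which is derived from RMSE (the helper name `psnr` is mine, not from the notebook):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return np.inf          # identical images
    return 10 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100.0)
assert psnr(a, a) == np.inf
assert psnr(a, a + 10) < psnr(a, a + 1)   # larger error, lower PSNR
```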

Wavelet-based approaches (JPEG 2000) do far better than DCT-based approaches on textured images or images with many edges, as they can preserve local texture better. However, JPEG 2000's licensing is not as permissive as JPEG's, and JPEG seems to work well enough for people who are not professional photographers, which explains JPEG's much wider adoption.

In [9]:
plt.matshow(np.abs(dct[:50, :50]), cmap=plt.cm.Paired);
plt.title('First 2500 coefficients in Grid');

If we look at the first 2500 coefficients in a 50x50 grid, we can see that a lot of the coefficients are actually very small compared to the few very large ones. This not only provides good energy compaction for the image (fewer significant coefficients means a higher compression rate), but also a good compromise between compression and image quality. Generally, the magnitudes of the very low frequency coefficients are orders larger than those of the very high frequencies.
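To quantify that observation, one can check how much of the total energy falls in that top-left 50x50 block. A standalone sketch, using a smooth synthetic image as a stand-in for the downloaded photo:

```python
import numpy as np
from scipy import fftpack

# A smooth synthetic 256x256 "image" standing in for a natural photo
y, x = np.mgrid[0:256, 0:256]
img = 128 + 100 * np.sin(x / 40.0) * np.cos(y / 55.0)

dct = fftpack.dct(fftpack.dct(img.T, norm='ortho').T, norm='ortho')
total_energy = np.sum(dct ** 2)
top_left_energy = np.sum(dct[:50, :50] ** 2)

# Nearly all of the energy lives in the low-frequency (top-left) block
assert top_left_energy / total_energy > 0.99
```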

First 64 Images

In [10]:
fig = plt.figure(figsize=(16, 16))
for ii in range(64):
    plt.subplot(8, 8, ii + 1)
    plt.imshow(reconstructed_images[ii], cmap=plt.cm.gray)
    plt.grid(False);
    plt.xticks([]);
    plt.yticks([]);

The reason why the DCT is so successful can be seen from these images: the first 40-50 coefficients out of 256 capture most of the image (albeit with some noticeable artifacts around the edges). If we use only these coefficients to reconstruct the image, we get high compression without losing much image quality in the process. Lossy compression generally uses this type of scheme to achieve its compression rate: it discards some of the high-frequency information (edges and fast transitions) while preserving the slow changes and constant colors. Since most signals are low frequency by nature, this approach works quite well in practice. We remove the high-frequency coefficients and transmit only the "important" coefficients, the ones that carry most of the information about the signal.
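This low-pass truncation is easy to sketch end to end; the helper below mirrors the zeroing loop used earlier (run on a random test image, so the exact error values are illustrative only):

```python
import numpy as np
from scipy import fftpack

def truncated_reconstruction(img, keep):
    """Zero all but the first `keep` rows/columns of DCT coefficients, then invert."""
    c = fftpack.dct(fftpack.dct(img.T, norm='ortho').T, norm='ortho')
    c[keep:, :] = 0
    c[:, keep:] = 0
    return fftpack.idct(fftpack.idct(c.T, norm='ortho').T, norm='ortho')

img = np.random.rand(64, 64) * 255

def rms_error(keep):
    return np.sqrt(np.mean((truncated_reconstruction(img, keep) - img) ** 2))

# Keeping more coefficients strictly lowers the reconstruction error
assert rms_error(32) < rms_error(8)
```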

Second 64 Images

In [11]:
fig = plt.figure(figsize=(16, 16))
for ii in range(64, 128):
    plt.subplot(8, 8, ii - 63)
    plt.imshow(reconstructed_images[ii], cmap=plt.cm.gray)
    plt.grid(False);
    plt.xticks([]);
    plt.yticks([]);

In this set of 64 images, most of the artifacts around the edges become unnoticeable as more and more high-frequency components are included in the reconstructed image.

Third 64 Images

In [12]:
fig = plt.figure(figsize=(16, 16))
for ii in range(128, 192):
    plt.subplot(8, 8, ii - 127)
    plt.imshow(reconstructed_images[ii], cmap=plt.cm.gray)
    plt.grid(False);
    plt.xticks([]);
    plt.yticks([]);

Last 64 Images

In [13]:
fig = plt.figure(figsize=(16, 16))
for ii in range(64):
    plt.subplot(8, 8, ii + 1)
    plt.imshow(reconstructed_images[-ii-1], cmap=plt.cm.gray)
    plt.grid(False);
    plt.xticks([]);
    plt.yticks([]);

The last 64 images are constructed using coefficients 192-256 and look very similar to each other, almost identical. The last image is actually the original image.

In [15]:
IPython.display.Image(url='http://www.ultraimg.com/images/Bk39G.gif')
Out[15]:

This is what it looks like when we add coefficients one by one to reconstruct the image. Note that the first coefficients mostly capture overall brightness and hue; they produce a coarse, very approximate image. Then, as the high-frequency components come in, we get a high-quality image. You can also watch the artifacts go away as the high-frequency components are added. It is a nice way to think about compression in general.

Takeaways

  • DCT is surprisingly good at compressing natural images: even with very few coefficients, you can reconstruct the image quite well, thanks to its energy compaction properties.
  • Since the basis functions are orthogonal (for the common type, actually orthonormal), the coefficients are uncorrelated with each other, and a missing coefficient does not affect the remaining ones. This provides a reliable reconstruction even if one or more coefficients are lost. In coding schemes where the coefficients are somehow related, an error in the communication channel may propagate to other pixels, and the difference between the reconstructed and original image can be quite large even if only a few coefficients get corrupted.
  • At the receiver, one only needs to take the inverse dct transform; since the basis functions are the same at both transmitter and receiver, the transmitter does not have to send them. This provides a common, universal coding scheme: if your computer can compute an inverse dct, you only need to download a few coefficients to reconstruct the image.
  • In practice, DCT is applied to images block by block, which may not be the best way to approach compression, as the blocks do not take edges and texture into consideration.
  • In general, DCT-based compression throws away the very high frequency components, so it may be better to use some other compression scheme if your image is rich in them (heavy texture, lots of transitions and sharp edges).
  • The DFT of a real signal is conjugate-symmetric in the frequency domain, so part of the computation is wasted: half of the coefficients can be reproduced from the other half. The DCT not only produces a more compact representation than the DFT but also avoids this redundancy.
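The conjugate symmetry in the last point is easy to verify with numpy; it is also why `np.fft.rfft` returns only the non-redundant half of the spectrum for real input:

```python
import numpy as np

x = np.random.rand(16)   # a real signal
X = np.fft.fft(x)
N = len(x)

# For real input, X[N-k] is the complex conjugate of X[k]
for k in range(1, N):
    assert np.allclose(X[N - k], np.conj(X[k]))

# ...so rfft keeps only the N//2 + 1 non-redundant coefficients
assert len(np.fft.rfft(x)) == N // 2 + 1
```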